ABSTRACT

We study the possibility of employing neural networks to simulate jet clustering
procedures in high energy hadron-hadron collisions. We concentrate our analysis on the
Fermilab Tevatron energy and on the $k_\perp$ algorithm. We employ both supervised and
unsupervised neural networks. In the first case we consider a multilayer feed-forward
network trained by the backpropagation algorithm: our results show that these networks
can satisfactorily simulate the relevant features of the $k_\perp$ algorithm. We also
consider unsupervised learning, where the neural network autonomously organizes the
events in clusters. The results of this analysis are discussed and compared with the
supervised approach.



1 Introduction 



Neural Networks (NN) are steadily becoming a standard method of analysis in high energy
physics. Numerical simulations based on the most common Monte Carlo codes have been
implemented to study a number of effects, such as the discrimination between gluon and
quark jets in high energy $e^+e^-$ collisions […]; $b\bar{b}$ versus light $q\bar{q}$
production at the $Z^0$ peak […]; and Higgs particle searches at future colliders […],
to give only a few examples (for a review see […]). Hardware implementations are becoming
fashionable as well [7], and they might offer a clue to difficult technical problems arising
at high energy, high luminosity future colliders, since they might provide on-line triggers
for data acquisition in demanding experimental environments. In the present letter we
address the problem of the simulation of jet-finding algorithms in high energy hadron
hadron collisions. Intuitively, a jet is a collimated spray of energetic particles that, when
arising from hard parton parton scattering, can shed light on short distance QCD dynamics.
This intuitive definition has to be made precise for more detailed, quantitative analysis.
The first attempt in this direction was the JADE algorithm […] for jet definition in
$e^+e^-$ scattering; it introduces a resolution variable
$d_{ij} = 2E_iE_j(1 - \cos\theta_{ij})$ for each pair of particles (jets) having energies
$E_i$, $E_j$ and angular separation $\theta_{ij}$. Once scaled by the total energy $Q$,
$y_{ij} = d_{ij}/Q^2$, this distance is compared to a given threshold parameter $y_{cut}$,
and the pair belongs to the same jet provided that $y_{ij} < y_{cut}$.
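
As an illustration, the JADE pair criterion just described can be written in a few lines of Python. This is only a sketch: the function names `jade_y` and `same_jet` are hypothetical, and the energies are assumed to be given in the same units as the total energy `q`.

```python
def jade_y(e_i, e_j, cos_theta_ij, q):
    """Scaled JADE variable y_ij = 2 E_i E_j (1 - cos(theta_ij)) / Q^2."""
    d_ij = 2.0 * e_i * e_j * (1.0 - cos_theta_ij)
    return d_ij / q ** 2


def same_jet(e_i, e_j, cos_theta_ij, q, y_cut):
    """The pair is assigned to the same jet when y_ij < y_cut."""
    return jade_y(e_i, e_j, cos_theta_ij, q) < y_cut
```

For two 10 GeV particles at right angles in an event with Q = 100 GeV, `jade_y` gives 0.02, so the pair is merged for a cut of 0.05 but not for a cut of 0.01.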

This first jet definition has evolved into a more sophisticated jet algorithm, the
so-called $k_\perp$ algorithm […], which we briefly review in the next section. This
algorithm solves some of the problems found in older algorithms, such as the attractive
kinematic correlation of soft particles induced by the JADE algorithm or the overlap of
jets in the Cone algorithm for hadron hadron scattering [10]$^1$; moreover, the clustering
algorithm has a cleaner theoretical foundation [11] and clear advantages in the small
$y_{cut}$ region, since it allows resummation to all orders in $\alpha_s$ of the large
double logarithmic corrections arising from soft collinear gluon emissions.

The $k_\perp$ clustering algorithm is in general slow, especially in hadron hadron
collisions, where one has to separate the jets arising from hard parton scattering from
the soft jets associated with the two initial beams, and at very high energies, because of
the high multiplicity of the final state. Therefore it may be worthwhile to study the
feasibility either of simulating the $k_\perp$ algorithm by a supervised neural network
or of implementing an unsupervised NN which finds its own way to cluster the particles.
These two approaches are examined in sections 3 and 4 and are applied to the event-by-event
analysis of the number of jets. Finally, in section 5 we draw our conclusions.



2 $k_\perp$ clustering algorithm

The $k_\perp$ clustering algorithm, as applied to $e^+e^-$ collisions, uses the following
resolution variable

    $d_{ij} = 2\min\{E_i^2, E_j^2\}(1 - \cos\theta_{ij})$    (2.1)

and $y_{ij} = d_{ij}/Q^2$, to be compared to the resolution parameter $y_{cut}$.

$^1$ For a review of the advantages of the $k_\perp$ algorithm over the JADE or Snowmass
definitions of a jet, see […].

When applied to hadron hadron scattering, the algorithm merges a final state particle $i$
into the jet $j$ or attributes it to the beam remnants (beam jet), depending on which is
the smaller of

    $d_{ij} = 2\min\{E_{Ti}^2, E_{Tj}^2\}\left[(\eta_i - \eta_j)^2 + (\phi_i - \phi_j)^2\right]$    (2.2)

and

    $d_{iB} = \ldots$    (2.3)

Here $E_{Ti}$ is the transverse energy of the $i$-th particle with respect to the beam
direction, $\eta_i = -\ln\tan(\theta_i/2)$ is its pseudorapidity, and $\phi_i$ is its
azimuthal angle around the beam axis. The jet variables are obtained from the jet
4-momentum $p_J$, which is defined by

    $p_J^\mu = \sum_{i \in J} p_i^\mu$    (2.4)

where the sum runs over all the particles in the jet $J$.
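
The pair distance of this section translates directly into code. The Python sketch below is illustrative only: the function names are hypothetical, eq. (2.2) is transcribed with the prefactor as printed, the azimuthal difference is wrapped into $[-\pi, \pi]$ (an assumption not stated in the text), and the beam distance $d_{iB}$ is passed in as a number rather than computed.

```python
import math


def pseudorapidity(theta):
    """eta = -ln tan(theta / 2), with theta the polar angle from the beam axis."""
    return -math.log(math.tan(theta / 2.0))


def delta_phi(phi_i, phi_j):
    """Azimuthal difference wrapped into [-pi, pi] (an assumption of this sketch)."""
    d = (phi_i - phi_j) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d


def d_pair(et_i, eta_i, phi_i, et_j, eta_j, phi_j):
    """Pair distance of eq. (2.2): 2 min(E_Ti^2, E_Tj^2) [(d eta)^2 + (d phi)^2]."""
    r2 = (eta_i - eta_j) ** 2 + delta_phi(phi_i, phi_j) ** 2
    return 2.0 * min(et_i, et_j) ** 2 * r2


def merge_or_beam(d_ij, d_ib):
    """Return 'merge' if the pair distance is the smaller one, otherwise 'beam'."""
    return "merge" if d_ij < d_ib else "beam"
```

Keeping the beam distance as an argument lets the same decision function be reused with whichever definition of $d_{iB}$ is adopted.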

In order to separate the beam remnants from the hard parton jets, one usually examines
the final state particles twice. In the first step a rather large value of $y_{cut}$
($y_{cut} \sim 1$) is used and compared with $d_{ij}/Q_{hard}^2$ and $d_{iB}/Q_{hard}^2$.
$Q_{hard}$ is a reference mass (typical values of $Q_{hard}$ are around $10^2$ GeV; in our
case we use $Q_{hard} \simeq 55$ GeV). Once the soft remnants have been attributed to the
beams, the final state is examined again by the algorithm, with different values of the
jet resolution parameter $y_{cut}$ (typical values are $10^{-2}$ to $10^{-1}$).

In the following we shall focus our attention on the event-by-event analysis of $n_j$, the
number of hard jets, and on the average energies of the different jets. In this paper we
have chosen to work at high, but not very high, energies: we consider the case of the
Tevatron at Fermilab ($\sqrt{s} = 1.8$ TeV) and defer LHC studies to future analyses. The
reason for this limitation is practical. We choose to analyze all the final particles
arising from hard parton scattering; in other words, we exclude the beam jets. Since we use
all the particles of the hard jets, we are able to perform more detailed analyses and to
use unprocessed variables. This implies that we have to consider rather large neural
networks, as will be discussed in more detail in the next sections. At the Tevatron energy,
the number of final particles originating from hard scattering, $n_f$, may be of the order
of $10^2$. We have selected only events with $n_f < 80$, which represent more than 70% of
the total.



Our study is based on simulated events produced by the HERWIG Monte Carlo [12]. For
each event, we take as input ($p_x$, $p_y$, $p_z$) or alternatively ($E$, $\eta$, $\phi$)
for each of the $n_f$ final particles.



3 Backpropagation feed-forward neural network simulation

Our first task is to simulate the $k_\perp$ algorithm by a feed-forward NN trained by the
backpropagation rule [13]. Backpropagation networks have been extensively applied to high
energy physics and will not be reviewed here. Suffice it to say that we use a network
with 240 input neurons (three momentum components for each of up to 80 particles), one
hidden layer of 100 neurons and 5 or 7 output units, according to the value of $y_{cut}$.
The input neurons $x_i$ are activated by the momenta $p_x$, $p_y$, $p_z$ of all the final
state particles, ordered by energy; if the final state contains fewer than 80 particles,
the remaining inputs are set to zero. The momenta $p_{x_i}$, $p_{y_i}$, $p_{z_i}$ are
normalized to the interval [0, 1]. As usual, the output neurons $Y_j$ also take values in
the interval [0, 1]. More precisely, the output neurons are assigned the following target
values during the training: if the event, when analyzed by the $k_\perp$ algorithm,
contains $i$ hard jets, then we put $Y_i = 1$ and $Y_k = 0$ for $k \neq i$. In the second
phase, the so-called testing phase, $Y_j$ may have any value in the interval [0, 1], and it
is then assigned the value $Y_j = 1$ or $Y_j = 0$ according to a threshold parameter $T_h$
(see below).
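
The input and target encoding described above can be sketched in Python as follows. Several details are assumptions of this sketch and are not specified in the text: the linear map of each momentum component from $[-p_{max}, p_{max}]$ to [0, 1], the ordering by decreasing energy with $E = |\vec{p}|$ (massless particles), and the names `encode_event`, `one_hot_target` and `p_max`.

```python
import math


def encode_event(particles, p_max, max_particles=80):
    """Build the 3 * max_particles input vector from a list of (px, py, pz).

    Particles are sorted by decreasing |p| (assumed ordering), each component
    is mapped linearly from [-p_max, p_max] to [0, 1] (assumed normalization),
    and unused slots are filled with zeros.
    """
    if len(particles) > max_particles:
        raise ValueError("event has too many particles for this network")
    ordered = sorted(
        particles,
        key=lambda p: math.sqrt(sum(c * c for c in p)),
        reverse=True,
    )
    inputs = []
    for p in ordered:
        inputs.extend((c + p_max) / (2.0 * p_max) for c in p)
    inputs.extend([0.0] * (3 * max_particles - len(inputs)))
    return inputs


def one_hot_target(n_jets, n_outputs):
    """Training target: Y_i = 1 for i = n_jets and Y_k = 0 otherwise."""
    if not 1 <= n_jets <= n_outputs:
        raise ValueError("n_jets must lie between 1 and n_outputs")
    return [1.0 if k == n_jets else 0.0 for k in range(1, n_outputs + 1)]
```

With this particular normalization a vanishing momentum component maps to 0.5, so the zero padding of empty slots remains distinguishable from real particles.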

The number of jets depends on the value of $y_{cut}$. The training set consists of about
40,000 events. When these are studied by the $k_\perp$ algorithm, the average number of
hard jets $\langle n_j \rangle$ for the two values of $y_{cut}$ is
$\langle n_j \rangle = 2.9$ for $y_{cut} = 10^{-2}$ and $\langle n_j \rangle = 2.0$ for
$y_{cut} = 10^{-1}$.

For $y_{cut} = 10^{-2}$ most of the events are concentrated at $n_j = 3$ (about 40%),
with about 20% 4-jet events and 25% 2-jet events; moreover, we have a few (about 5%) events
with only 1 jet, which can be attributed to an imperfect balance of the two beam jets. For
$y_{cut} = 10^{-1}$, around 75% of the events have 2 hard jets, with the remainder almost
equally distributed between 1-jet and 3-jet events.

During the testing phase, about 3,500 events, different from those of the training set,
have been presented to the NN. We divide the events into classes according to their number
of jets, i.e. a given event belongs to the class $\{l\}$ ($l = 1, 2, \ldots$) if its
particles are clustered into $l$ jets by the $k_\perp$ algorithm. For each class $\{l\}$ we
can define a purity $p_l$,

    $p_l = \frac{N_l^a}{N_l^a + N_l^x}$    (3.1)

and an efficiency $\eta_l$,

    $\eta_l = \frac{N_l^a}{N_l^{tot}}$    (3.2)

where $N_l^a$ is the number of events with $l$ hard jets classified as belonging to the
class $\{l\}$ by the NN, $N_l^x$ is the number of events with $j$ ($j \neq l$) hard jets
interpreted as events with $l$ jets, and $N_l^{tot}$ is the total number of events with $l$
hard jets (accepted or not).

We can vary $p_l$ and $\eta_l$ by modifying an internal parameter of the network, the
acceptance parameter $T_h$. It is defined as follows: in the testing phase, the calculated
output $Y_i$ of the neuron $i$ is assigned the value

    $Y_i = 1$  if  $Y_i > 1 - T_h$
    $Y_i = 0$  if  $Y_i < 1 - T_h$    (3.3)
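
The acceptance rule (3.3) and the purity and efficiency of a class can be sketched as follows. This is a minimal Python illustration, assuming that each test event is stored as a pair (true number of jets, list of network outputs); the function names are hypothetical, and purity is computed as correctly accepted events over all accepted events.

```python
def accepted_classes(outputs, t_h):
    """Return the 1-based indices i whose output satisfies Y_i > 1 - T_h."""
    return [i for i, y in enumerate(outputs, start=1) if y > 1.0 - t_h]


def purity_efficiency(events, l, t_h):
    """Purity and efficiency of class {l} for a given acceptance parameter.

    events: list of (true_n_jets, outputs). Returns (purity, efficiency);
    an entry is None when its denominator vanishes.
    """
    n_a = n_x = n_tot = 0
    for true_n, outputs in events:
        accepted = l in accepted_classes(outputs, t_h)
        if true_n == l:
            n_tot += 1
            if accepted:
                n_a += 1  # event with l jets correctly accepted
        elif accepted:
            n_x += 1      # event with j != l jets wrongly accepted
    purity = n_a / (n_a + n_x) if (n_a + n_x) > 0 else None
    efficiency = n_a / n_tot if n_tot > 0 else None
    return purity, efficiency
```

Scanning `t_h` from small to large values traces out a purity versus efficiency curve: a looser threshold accepts more events of the class but also more contamination.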

Typical results are shown in Fig. 1 for $y_{cut} = 10^{-2}$, for events with 1 jet
(Fig. 1a) and 2 jets (Fig. 1b). For 3 jets we get a purity $p_3 \simeq 0.54$ with
efficiency $\eta_3$ in the range 0.5 to 0.7; for 4 jets a purity of 0.43 can be obtained
with efficiency $\eta_4 \simeq 0.4$. For $y_{cut} = 10^{-1}$ one gets better results in
terms of purity and efficiency: for example, for 1-jet events $p_1 = 0.98$ with
$\eta_1 = 0.6$ (see Fig. 1c), and for 2 jets $p_2 \simeq 0.91$ at $\eta_2 = 0.6$ (see
Fig. 1d). We have used the momenta ($\vec{p}_i$) as input variables in this analysis; had
we used the ($\eta_i$, $\phi_i$, $E_i$) variables as inputs, similar results would have
been obtained.






4 Analysis by an unsupervised competitive NN 



In this section we make use of unsupervised competitive learning$^2$ to study the
feasibility of a neural network that implements a clustering algorithm without preliminary
supervised training. Among the various NN approaches using unsupervised training, we
choose here a self-organizing architecture$^3$. More precisely, we use a single layer
network with $N = 240$ neurons in the input layer and an output layer of $M$ neurons. The
output neurons are arranged on a square lattice: we have used $M$ ranging from $5^2$ to
$20^2$ (better results are obtained with larger values of $M$). At each time step a new
event $x = \{x_i\}$ ($i = 1, \ldots, N$) is presented as an input to the network and the
distance

    $d_k = \sum_{i=1}^{N} (x_i - W_{ik})^2$    (4.1)

is computed for each output neuron $k$ ($k = 1, \ldots, M$). Here $W_{ik}$ is the element
of the weight matrix (synaptic matrix) connecting the input neuron $i$ to the output
neuron $k$; the weights are initially chosen random and small. Among the output neurons,
let $m$ be the one with the smallest distance from $x$:

    $d_m < d_k \quad \forall\, k = 1, \ldots, M$ ;    (4.2)

in this case the output neuron $m$ becomes the winner and the synapses are modified as
follows:

    $W_{ij} \to W_{ij} + \Delta W_{ij}$
    $\Delta W_{ij} = \eta_j (x_i - W_{ij})$    (4.3)

In the so-called winner-take-all version of the algorithm one puts
$\eta_j = \eta\,\delta_{jm}$, i.e. only the weights of the winner neuron $\{W_{im}\}$ are
modified; $\eta$ is a positive parameter and the result is to shift $W_{im}$ towards
$x_i$. We have used the self-organizing version of this algorithm, with
$\eta_j = \eta\,\Lambda(j, m)$, where $\Lambda(j, m)$ is a function peaked at $j = m$ and
rapidly decreasing with the distance between $j$ and $m$. This ensures that not only $m$,
but also its neighbours, change their weights towards $x$. The result of the updating rule
(4.3) is that, after several iterations, $\vec{W}_m$ yields a representation of all the
events that have made the output neuron $m$ the winner. Moreover, output neurons that are
close in distance have similar weights.
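
One presentation of an event to the self-organizing network, i.e. the winner search followed by the update rule (4.3), can be sketched in Python as below. The Gaussian form chosen for $\Lambda(j, m)$, its width `sigma`, the row-major numbering of the lattice sites and all function names are assumptions of this sketch; the text only requires $\Lambda$ to be peaked at $j = m$ and rapidly decreasing.

```python
import math


def squared_distance(x, w_k):
    """d_k = sum_i (x_i - W_ik)^2 for one output neuron."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w_k))


def winner(x, weights):
    """Index m of the output neuron with the smallest distance from x."""
    return min(range(len(weights)), key=lambda k: squared_distance(x, weights[k]))


def neighbourhood(j, m, side, sigma):
    """Assumed Gaussian Lambda(j, m) on a side x side square lattice."""
    row_j, col_j = divmod(j, side)
    row_m, col_m = divmod(m, side)
    dist2 = (row_j - row_m) ** 2 + (col_j - col_m) ** 2
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def som_step(x, weights, side, eta, sigma):
    """Apply W_ij -> W_ij + eta * Lambda(j, m) * (x_i - W_ij) in place.

    weights[j] is the weight vector of output neuron j. Returns the winner m.
    """
    m = winner(x, weights)
    for j, w_j in enumerate(weights):
        rate = eta * neighbourhood(j, m, side, sigma)
        for i in range(len(w_j)):
            w_j[i] += rate * (x[i] - w_j[i])
    return m
```

In the limit of a very small `sigma` only the winner is appreciably updated, which reproduces the winner-take-all version.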

In our case the number of output neurons $M$ could in principle be as small as $3^2$,
since the number of jets obtained by the $k_\perp$ analysis never exceeds 7. In practice,
however, more neurons are needed, since the topologies of events having the same number of
jets can differ widely from each other. We rotate each event so that its most energetic
particle lies along the positive $z$ axis; this fixes an average direction for the first
jet, but the other jets can be scattered in any other direction. Therefore more neurons
are needed to take into account the different kinematical configurations. After several
presentations (we have used the same training set of 40,000 events employed in the
supervised analysis described previously) one can adopt two different strategies to
analyze the learned weights.



$^2$ For an introduction to the subject of unsupervised neural networks see [14], chap. 9.

$^3$ For a more detailed description of the self-organizing map algorithm see [15]; for
other applications in high energy physics see, e.g., [16].

I) The events are first analyzed by the $k_\perp$ algorithm; after this analysis each event
can therefore be labelled by an integer $n_j$, which specifies the jet multiplicity,
i.e. the number of jets in the event.

Now let us consider the output neuron $m$, and let us suppose that it has been the winner
neuron $\omega_1^{(m)}$ times with events having $n_j = 1$ (i.e. events with 1 jet),
$\omega_2^{(m)}$ times with events having 2 jets, etc. Let $\omega_l^{(m)}$ be the largest
among the $\omega_j^{(m)}$'s, i.e. $\omega_l^{(m)} \ge \omega_j^{(m)}$ for any $j$. We
assume a majority rule: the output neuron $m$ is then considered representative of the
class $\{l\}$, i.e. it represents the class of the events having $l$ jets. We can now
define purity ($p_l$) and efficiency ($\eta_l$) for each class $\{l\}$ of events by
formulae analogous to those of the previous section. We define

    $p_l = \frac{N_l^a}{N_l^{acc}}$    (4.4)
    $\eta_l = \frac{N_l^a}{N_l^{tot}}$    (4.5)

Now $N_l^a$ has the following definition:

    $N_l^a = \sum_{m|l} \omega_l^{(m)}$    (4.6)

where the symbol $m|l$ means that the sum runs over all the output neurons $m$ which,
according to the majority rule, represent the class $\{l\}$, i.e. are considered
representatives of the events with $l$ jets. In other terms, $N_l^a$ is obtained by summing
all the events with $l$ jets, provided they have been accepted, which, in this context,
means that they have been used to modify the weights of the output neurons of the class
$\{l\}$. Analogously,

    $N_l^{acc} = \sum_{m|l} \sum_j \omega_j^{(m)}$    (4.7)

represents the total number of accepted events, i.e. events that have been attributed to
the class $\{l\}$. Finally, as before, $N_l^{tot}$ is the total number of events with $l$
jets.
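
The majority rule and the resulting purity and efficiency can be sketched in Python as follows. The sketch assumes that the training record is available as a list of (winner neuron, number of jets) pairs, one per accepted event; the function names are hypothetical, and ties between classes are broken in favour of the class encountered first, a choice the text does not specify.

```python
from collections import Counter


def majority_labels(wins):
    """Count, for each neuron m, how often it won with each jet multiplicity.

    wins: list of (winner_neuron, n_jets). Returns (labels, counts), where
    labels[m] is the majority class of neuron m and counts[m][j] plays the
    role of omega_j^(m).
    """
    counts = {}
    for m, n_jets in wins:
        counts.setdefault(m, Counter())[n_jets] += 1
    labels = {m: c.most_common(1)[0][0] for m, c in counts.items()}
    return labels, counts


def som_purity_efficiency(wins, l):
    """Purity and efficiency of class {l}; None when a denominator vanishes."""
    labels, counts = majority_labels(wins)
    neurons = [m for m, label in labels.items() if label == l]
    n_a = sum(counts[m][l] for m in neurons)                   # l-jet events in class {l}
    n_acc = sum(sum(counts[m].values()) for m in neurons)      # all events in class {l}
    n_tot = sum(1 for _, n_jets in wins if n_jets == l)        # all l-jet events
    purity = n_a / n_acc if n_acc > 0 else None
    efficiency = n_a / n_tot if n_tot > 0 else None
    return purity, efficiency
```

For example, if neuron 0 wins twice with 1-jet events and once with a 2-jet event, it is labelled as class 1 and contributes two correct events out of three accepted ones.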

The results we have obtained by this analysis are as follows. First of all we consider the
distribution of the events on the output square lattice of $20^2$ neurons. For
$y_{cut} = 10^{-2}$ it is given by the Lego plot in Fig. 2. We see clearly that 1-jet
events are concentrated in a few neurons at the center of the output square, the 3-jet
events are mostly concentrated at the borders, whereas the neurons representative of the
events with 2 jets are in an intermediate position. The diagram in Fig. 2 is useful to
illustrate the topology of the output neurons, but does not by itself give quantitative
results. These can be obtained using the previous definitions of purity and efficiency for
the different classes. Some of these results are reported in Table 1. For each class one
can get several results for the pair (purity, efficiency) by modifying the rule (4.2) as
follows:

    $d_m < \min\{t, d_k\} \quad \forall\, k = 1, \ldots, M$ ;    (4.8)

where $t$ is an internal parameter; if no output neuron satisfies this condition the
event is discarded. In Table 1 the two columns are obtained with two different values of
$t$: $t = 0.33$ (first column) and $t = +\infty$ (second column).
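
The modified winner rule (4.8) amounts to accepting the closest output neuron only if its distance is below $t$. A minimal Python sketch, with a hypothetical function name and `None` used to mark a discarded event, is:

```python
def winner_with_threshold(x, weights, t=float("inf")):
    """Return the winner index m if d_m < t, otherwise None (event discarded).

    weights[k] is the weight vector of output neuron k and
    d_k = sum_i (x_i - W_ik)^2 as in eq. (4.1).
    """
    distances = [
        sum((xi - wi) ** 2 for xi, wi in zip(x, w_k)) for w_k in weights
    ]
    m = min(range(len(distances)), key=distances.__getitem__)
    return m if distances[m] < t else None
```

With the default `t = +inf` every event is accepted and the rule reduces to (4.2); a finite `t` discards events that are far from every weight vector.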

II) Unsupervised competitive neural architectures can be used in a different way. Since
$\vec{W}_k$ supplies an internal representation of the patterns that have activated the
neuron $k$, we can interpret, for each output neuron $k$, $\vec{W}_k = \{W_{ik}\}$ as the
distribution of the particle momenta of a hypothetical event, which we call a $W_{ik}$
event. The events represented by $\vec{W}_k$ can be analyzed by the $k_\perp$ algorithm.
In other terms, we can use the network as a model of the sample of physical events that
have been used to construct $W_{ik}$. Since the number of $W_{ik}$ events is
$M \le 20^2$, much smaller than the number of events in the original sample (about
40,000), this can significantly reduce the time needed for the analysis. It is also
evident that, due to simplifying assumptions (for example, we have discarded events with
more than 80 particles), the results obtained by this method are approximate; in real
situations this method can be used as a preliminary classifier of events with a given
number of jets; after this screening of the input data, the events of a particular class
of interest might be analyzed by the more precise (but time consuming) $k_\perp$
algorithm. The results obtained by this analysis are as follows. First of all, one can
compute the average number of jets $\langle n_j \rangle$ using the hypothetical $W_{ik}$
events. For $y_{cut} = 10^{-2}$ one obtains

    $\langle n_j \rangle = 2.84$  ($\langle n_j \rangle = 2.9$)    (4.9)

while for $y_{cut} = 10^{-1}$

    $\langle n_j \rangle = 1.75$  ($\langle n_j \rangle = 2.0$)    (4.10)

where the values given in parentheses are the results of the $k_\perp$ analysis on the
original 40,000 events. We can see that the results obtained by the $k_\perp$ algorithm on
the hypothetical $W_{ik}$ events are similar to those obtained by analyzing the full sample
of original events.

A similar analysis can be performed on the average energies of the jets. Let us consider
two groups of events: group A (events with particles clustered in 2 jets) and group B
(events with particles clustered in 3 jets). We compute, for two values of $y_{cut}$
($y_{cut} = 0.1$ and 0.01), the average energies of the 2 jets of group A and those of the
3 jets of group B (the jets are ordered in energy). Again we perform two computations, one
with the physical events, i.e. the original 40,000 events, and the other with the $W_{ik}$
events. Table 2 shows that the results are rather similar.

5 Conclusions 

Our study shows that neural networks can be usefully employed to simulate jet clustering
procedures, in particular the $k_\perp$ algorithm, in high energy hadron-hadron collisions.
We have considered both supervised and unsupervised neural networks. In the first case we
have used a multilayered feed-forward network trained by the backpropagation algorithm
and we have shown that this network can satisfactorily simulate the average number of
jets as a function of $y_{cut}$. We have also considered unsupervised learning, in
particular self-organizing competitive neural networks, characterized by autonomous
organization of the events in clusters. Our results show that the clustering produced by
this network has significant similarities with that induced by the $k_\perp$ algorithm.
Class of jets           ($p_l$, $\eta_l$), $t = 0.33$    ($p_l$, $\eta_l$), $t = +\infty$

$\{l\} = 1$ (1 jet)     (0.65, 0.53)                     (0.52, 0.48)
$\{l\} = 2$ (2 jets)    (0.53, 0.16)                     (0.46, 0.24)
$\{l\} = 3$ (3 jets)    (0.60, 0.02)                     (0.46, 0.87)
$\{l\} = 4$ (4 jets)    (0.67, 0.001)                    (0.40, 0.03)

Table 1: Purity ($p_l$) and efficiency ($\eta_l$) pairs for different classes of jets. The
first column is obtained with $t = 0.33$, the second column with $t = +\infty$; $t$ is
defined in eq. (4.8).



                   physical events         $W_{ik}$ events
$y_{cut}$          0.1        0.01         0.1        0.01

A) 1st jet         174.4      152.4        172.4      170.0
   2nd jet         103.4      85.8         58.7       58.9

B) 1st jet         167.8      149.6        145.2      141.9
   2nd jet         90.8       80.3         64.9       52.8
   3rd jet         56.7       39.6         33.0       27.0

Table 2: Average values of the single jet energies for group A (2-jet events) and group B
(3-jet events).
Figure Captions

Fig. 1 The purity $p_l$ versus efficiency $\eta_l$ for two different values of $y_{cut}$
and for two values of $l = n_j$ (number of jets); a: $l = 1$, $y_{cut} = 10^{-2}$;
b: $l = 2$, $y_{cut} = 10^{-2}$; c: $l = 1$, $y_{cut} = 10^{-1}$; d: $l = 2$,
$y_{cut} = 10^{-1}$.

Fig. 2 Distribution of the output neurons (unsupervised architecture) according to their
jet class (the identification of jet classes refers to the $k_\perp$ algorithm with
$y_{cut} = 10^{-2}$).